Search CORE

7 research outputs found

Improving prefetching mechanisms for tiled CMP platforms

Author: Torrents Lapuerta Martí
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2016

Recently, high-performance processor designs have evolved toward Chip-Multiprocessor (CMP) architectures to deal with the limits of instruction-level parallelism and, more importantly, to manage power consumption, which is becoming unaffordable due to increased transistor counts and clock frequencies. At present, this architecture, which implements multiple processing cores on a single die, is commercially available with up to twenty-four processors on a single chip, and roadmaps and research trends suggest that the number of cores will increase in the near future. The increase in the number of cores has turned the interconnection network into a key issue that will have a significant impact on performance. Moreover, as the number of cores increases, tiled architectures are expected to provide a scalable solution for handling design complexity. Network-on-Chip (NoC) emerges as a solution to deal with growing on-chip wire delays. On the other hand, CMP designs are likely to be equipped with latency-hiding techniques such as prefetching in order to reduce the negative impact on performance that high cache miss rates would otherwise cause. Unfortunately, the extra network messages that prefetching entails can drastically increase power consumption and latency in the NoC. In this thesis, we do not develop a new prefetching technique for CMPs but propose improvements applicable to any of them. Specifically, we analyze the behavior of prefetching in CMPs and its impact on the interconnect. We propose several dynamic management techniques to improve the performance of the prefetching mechanism in the system. Furthermore, we identify the main problems of implementing prefetching in distributed memory systems such as tiled architectures and propose directions to solve them.
Finally, we propose several research lines to continue the work done in this thesis.
Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Improving the prefetching performance through code region profiling

Author: Martínez Raul
Molina Clemente Carlos
Torrents Lapuerta Martí
Publication venue: Barcelona Supercomputing Center
Publication date: 05/05/2015

In this work, we propose a new technique to improve the performance of hardware data prefetching. The technique is based on detecting periods of time and regions of code where the prefetcher is not working properly, and therefore provides no speedup or even causes a slowdown. Once these periods and regions are detected, the prefetcher may be switched off and later switched back on. To implement such a mechanism efficiently, we identify three orthogonal issues that must be addressed: the granularity of the code region, when the prefetcher is switched on, and when the prefetcher is switched off.

UPCommons. Portal del coneixement obert de la UPC

Comparative Study of Prefetching Mechanisms

Author: Torrents Lapuerta Martí
Publication venue: Universitat Politècnica de Catalunya
Publication date: 23/09/2009

UPCommons. Portal del coneixement obert de la UPC

Comparative Study of Prefetching Mechanisms

Author: Torrents Lapuerta Martí
Publication venue: Universitat Politècnica de Catalunya

RECERCAT

Improving the prefetching performance through code region profiling

Author: Martínez Raul
Molina Clemente Carlos
Torrents Lapuerta Martí
Publication venue: Barcelona Supercomputing Center
Publication date: 05/05/2015

Network aware performance evaluation of prefetching techniques in CMPs

Author: Martinez Morais Raul
Molina Clemente Carlos
Torrents Lapuerta Martí
Publication venue: Elsevier BV
Publication date: 01/06/2014

This study focuses on the importance of quantifying the effect of prefetching on the interconnection network of a multiprocessor chip. Microarchitectural effects of this kind are often quantified using simulators. However, if prefetching traffic in a CMP (Chip MultiProcessor) system is to be evaluated accurately, simulators should model not only the memory hierarchy and the multicore system, but also the network-on-chip. Unfortunately, no open-source simulator is able to evaluate all these elements at the same time. This paper describes how to develop a prefetching module for the gem5 CMP simulator and how to integrate it into the Ruby memory system. Moreover, using the infrastructure developed in this study, the paper shows that prefetching-related studies must take the network effect into account in order to obtain accurate results: not doing so may lead to mistaken conclusions. For this purpose, we have carried out a detailed analysis of the behavior of three different prefetching engines, providing not only the typical statistics for instructions per cycle and miss rate, but also specific network and prefetching statistics.
Peer Reviewed

UPCommons. Portal del coneixement obert de la UPC

Network aware performance evaluation of prefetching techniques in CMPs

Author: Martinez Morais Raul
Molina Clemente Carlos
Torrents Lapuerta Martí

RECERCAT